Lexical idiosyncrasy in MWE extraction

Authors

  • Csaba Oravecz
  • Viktor Nagy
Abstract

A wide range of NLP methods has been investigated for the extraction of multiword expressions (MWEs) from large corpora. While a good deal of recent research has focused on developing reliable means to delineate subclasses of MWEs with respect to their degree of compositionality (Baldwin et al., 2003; McCarthy et al., 2003), it has been generally accepted that for the "simple" task of separating MWEs from fully productive word combinations, the substitutability of component words in a multiword unit with semantic neighbours could be a good indicator (Bannard et al., 2003). The underlying assumption is that MWEs do not generally tolerate the replacement of their components with semantically similar items. (Let us call this phenomenon lexical idiosyncrasy.) If we could represent this substitutability by some ranking measure, we would have reliable information on whether a word combination can be considered a multiword unit (Pearce, 2001).
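The substitutability idea can be illustrated with a minimal sketch. This is not the authors' measure: the function name `idiosyncrasy_score`, the toy counts, and the hand-written neighbour lists are all assumptions made for illustration. The score is the share of corpus occurrences held by the original word pair among itself and all variants obtained by swapping one component for a semantic neighbour. A value near 1 suggests the pair resists substitution.

```python
from collections import Counter


def idiosyncrasy_score(pair, pair_counts, neighbours):
    """Share of occurrences held by `pair` among itself and its
    single-substitution variants (hypothetical ranking measure)."""
    w1, w2 = pair
    original = pair_counts.get(pair, 0)
    if original == 0:
        # An unseen pair gives no evidence either way.
        return 0.0
    # Variants: replace the first or the second component with a neighbour.
    variants = {(n, w2) for n in neighbours.get(w1, ())}
    variants |= {(w1, n) for n in neighbours.get(w2, ())}
    variants.discard(pair)
    substituted = sum(pair_counts.get(v, 0) for v in variants)
    return original / (original + substituted)


# Toy bigram counts and neighbour lists, invented for this example.
pair_counts = Counter({
    ("red", "tape"): 30,
    ("red", "car"): 20,
    ("red", "truck"): 15,
    ("blue", "car"): 25,
})
neighbours = {
    "red": ["scarlet", "blue"],
    "tape": ["ribbon"],
    "car": ["truck"],
}

candidates = [("red", "tape"), ("red", "car")]
ranked = sorted(
    candidates,
    key=lambda p: idiosyncrasy_score(p, pair_counts, neighbours),
    reverse=True,
)
for pair in ranked:
    print(pair, round(idiosyncrasy_score(pair, pair_counts, neighbours), 3))
```

In this toy data, "red tape" has no attested substituted variants and ranks above "red car", whose neighbours "blue car" and "red truck" both occur. In practice the neighbour lists would come from a thesaurus or a distributional similarity model, and raw counts would likely be replaced by an association measure.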


Similar resources

A System for Compound Noun Multiword Expression Extraction for Hindi

Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...


Modeling the Statistical Idiosyncrasy of Multiword Expressions

The focus of this work is statistical idiosyncrasy (or collocational weight) as a discriminant property of multiword expressions. We formalize and model this property, compile a 2-class dataset of MWE and non-MWE examples, and evaluate our models on this dataset. We present a possible empirical implementation of collocational weight and study its effects on identification and extraction of MWEs...


Examining the Effect of Ideology and Idiosyncrasy on Lexical Choices in Translation Studies within the CDA Framework

Using a critical discourse analytic model of translation criticism, the present study attempts to explore the effect of ideology and idiosyncrasy on the lexical choices in translation studies. The study employed a descriptive approach to answer two research questions: Is there any relationship between ideology and idiosyncratic features of translators' lexical choices? And if yes, can it be ana...


Predicting the Compositionality of Multiword Expressions Using Translations in Multiple Languages

In this paper, we propose a simple, language-independent and highly effective method for predicting the degree of compositionality of multiword expressions (MWEs). We compare the translations of an MWE with the translations of its components, using a range of different languages and string similarity measures. We demonstrate the effectiveness of the method on two types of English MWEs: noun comp...


Project proposal Automatic extraction and evaluation of MWE: adapting method to French Language Technology: Research and Development

Our project is based on the theme of Multi Word Expressions (MWE); we will focus on the problem of extraction. This task is important for improving lexical resources used for tasks such as tokenization, parsing or translation. In our study we will work on a French corpus. Our aim will be not only to select candidates but also to validate automatically which ones are true MWEs. If we have time we wil...




Publication date: 2005